A confidence-based active approach for semi-supervised hierarchical clustering
نویسندگان
چکیده
Semi-supervised approaches have proven to be effective in clustering tasks. They allow user input, thus improving the quality of the clustering obtained, while maintaining a controllable level of user intervention. Despite being an important class of algorithms, hierarchical clustering has been little explored in semisupervised solutions. In this report, we address the problem of semi-supervised hierarchical clustering by using an active clustering solution with cluster-level constraints. This active learning approach is based on a new concept of merge confidence in an agglomerative clustering process. When there is lower confidence in a cluster merge the user can be queried and provide a clusterlevel constraint. The proposed method was compared with a unsupervised algorithm (average-link) and a semi-supervised algorithm based on pairwise constraints. The results show that our algorithm tends to be better than the pairwise constrained algorithm and can achieve a significant improvement when compared to the unsupervised algorithm.
منابع مشابه
On the Comparison of Semi-Supervised Hierarchical Clustering Algorithms in Text Mining Tasks
Semi-supervised clustering approaches have emerged as an option for enhancing clustering results. These algorithms use external information to guide the clustering process. In particular, semi-supervised hierarchical clustering approaches have been explored in many fields in the last years. These algorithms provide efficient and personalized hierarchical overviews of datasets. To the best of th...
متن کاملExtracting Prior Knowledge from Data Distribution to Migrate from Blind to Semi-Supervised Clustering
Although many studies have been conducted to improve the clustering efficiency, most of the state-of-art schemes suffer from the lack of robustness and stability. This paper is aimed at proposing an efficient approach to elicit prior knowledge in terms of must-link and cannot-link from the estimated distribution of raw data in order to convert a blind clustering problem into a semi-supervised o...
متن کاملAn Improved Semi-Supervised Clustering Algorithm Based on Active Learning
In semi supervised clustering is one of the major tasks and aims at grouping the data objects into meaningful classes (clusters) such that the similarity of objects within clusters is maximized and the similarity of objects between clusters is minimized. The dataset sometimes may be in mixed nature that is it may consist of both numeric and categorical type of data. Naturally these two types of...
متن کاملWised Semi-Supervised Cluster Ensemble Selection: A New Framework for Selecting and Combing Multiple Partitions Based on Prior knowledge
The Wisdom of Crowds, an innovative theory described in social science, claims that the aggregate decisions made by a group will often be better than those of its individual members if the four fundamental criteria of this theory are satisfied. This theory used for in clustering problems. Previous researches showed that this theory can significantly increase the stability and performance of...
متن کاملWised Semi-Supervised Cluster Ensemble Selection: A New Framework for Selecting and Combing Multiple Partitions Based on Prior knowledge
The Wisdom of Crowds, an innovative theory described in social science, claims that the aggregate decisions made by a group will often be better than those of its individual members if the four fundamental criteria of this theory are satisfied. This theory used for in clustering problems. Previous researches showed that this theory can significantly increase the stability and performance of...
متن کامل